Sequence-Structure Patterns: Discovery and Applications

Authors

  • T. Milledge
  • S. Khuri
  • X. Wei
  • C. Yang
  • G. Zheng
  • G. Narasimhan

Abstract

Protein sequence data is being generated at a tremendous rate; however, functional annotation of these proteins is proceeding at a much slower pace. Biologists rely on computational biology and pattern recognition to predict protein function. This approach rests on the observation that proteins sharing a similar function often exhibit conserved sequence patterns. Such sequence patterns, or motifs, are derived from multiple sequence alignments and have been collected in databases such as PROSITE, PRINTS, SPAT, and eMOTIF. These patterns help to classify proteins into families whose exact function may or may not be known. Research has shown that these domain signatures often exhibit specific three-dimensional structures. In this paper, we show how, starting from a seed sequence pattern from any of the existing sequence pattern databases and using information from the protein structure databases, it is possible to design biologically meaningful sequence-structure patterns (SSPs). An important by-product of our method for generating sequence-structure patterns is an improved sequence alignment, as well as an improved structural alignment, of proteins that belong to a family and contain that pattern. Validation was performed by matching the resulting SSPs to domains in the ASTRAL compendium associated with a family or superfamily designation in the SCOP database. SSPs generated by this method were frequently either fully specific (no false positives), fully sensitive (no false negatives), or both (diagnostic).
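
The terms fully specific, fully sensitive, and diagnostic can be illustrated with a small sketch. It is not the authors' method: it handles only the sequence half of a pattern (no structural information), assumes the common PROSITE-style pattern syntax, and uses invented toy sequences in place of ASTRAL domains and SCOP family labels. The function names `prosite_to_regex` and `evaluate` are hypothetical.

```python
import re

# One pattern element: a residue, x, [allowed], or {forbidden},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":  # x means any residue
            core = "."
        elif core.startswith("{"):  # {P} means any residue except P
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def evaluate(pattern, family, non_family):
    """Count matches of the pattern inside and outside a labelled family."""
    rx = re.compile(prosite_to_regex(pattern))
    tp = sum(1 for seq in family if rx.search(seq))
    fn = len(family) - tp  # family members the pattern misses
    fp = sum(1 for seq in non_family if rx.search(seq))
    return {
        "tp": tp,
        "fn": fn,
        "fp": fp,
        "fully_sensitive": fn == 0,  # no false negatives
        "fully_specific": fp == 0,  # no false positives
        "diagnostic": fn == 0 and fp == 0,
    }


# Toy data (assumption): artificial sequences built to match or miss the pattern.
PATTERN = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
family = [
    "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H",
    "MK" + "C" + "AAAA" + "C" + "AAA" + "F" + "A" * 8 + "H" + "AAAAA" + "H" + "KK",
]
non_family = ["MKTAYIAKQRQISFVKSHFSRQ", "GSSGSSGLVPRGSH"]

print(prosite_to_regex(PATTERN))
print(evaluate(PATTERN, family, non_family))
```

In this toy setting a pattern is diagnostic only when both error counts are zero; adding a family member that lacks the motif makes the same pattern fully specific but no longer fully sensitive.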

Similar resources

A Framework for Exploring the Frequent Patterns based on Activities Sequence

In recent years, the growing use of location-based tools has made it possible to produce geometric trajectories from users' movement paths. In this way, users' travel goals and related activities can be considered in addition to the geometry and route shape. The user activity trajectory represents the sequence of the visited activities and its related analysis as presented i...

Single Nucleotide Polymorphisms and Association Studies: A Few Critical Points

Uncovering DNA sequence variations that correlate with phenotypic changes, e.g., diseases, is the aim of sequence variation studies. A common type of sequence variation is the single nucleotide polymorphism (SNP, pronounced "snip"). SNPs are third-generation molecular markers. An SNP represents a DNA sequence variant of a single base pair with the minor allele occurring in more than 1% of a given popula...

Query Driven Sequence Pattern Mining

The discovery of frequent patterns in biological sequences has a large number of applications, including classification, clustering, and understanding sequence structure and function. This paper presents an algorithm that discovers frequent sequence patterns (motifs) present in a query sequence with respect to a database of sequences. The query is used to guide the mining process and th...

Discovery of Novel Peptidomimetics for Brain-Derived Neurotrophic Factor using Phage Display Technology

Brain-Derived Neurotrophic Factor (BDNF) is a neuroprotectant candidate for neurodegenerative diseases. However, there are several clinical concerns about its therapeutic applications. In the current study, we selected BDNF-mimicking small peptides from a phage-displayed peptide library as alternative molecules to address these clinical challenges. The peptide library was screened against BDNF receptor (Ne...

MULTIDIMENSIONAL LONGEST COMMON SUBSEQUENCE DISCOVERY FROM LARGE DATABASE USING DNA OPERATIONS

A central problem in the analysis of biological sequences is the discovery of sequence similarity of various kinds in the primary structure of related proteins and genes. This sequence search can be applied to tasks such as the discovery of association rules, strong rules, correlations, sequential rules, frequent episodes, multidimensional patterns, and many other important discovery tasks. In t...

Algorithms for pattern matching and discovery in RNA secondary structure

Text-indexing structures provide significant advantages in the solution of many problems related to string analysis and comparison, and are nowadays widely used in the analysis of biological sequences. In this paper, we present some applications of affix trees to problems of exact and approximate pattern matching and discovery in RNA sequences. By allowing bidirectional search for symmetric pat...

Publication date: 2005